Python与爬虫基础

发表于 2018-07-30 | 更新于 2018-08-03

python基础

1.Python有6种标准数据类型

数字

布尔（Boolean）

字符串（string）

列表（List）

元组（tuple）

字典（dict）

2.Python支持4种数字类型

int（整形）

long（长整型，python3取消，变为int）

float（浮点型）

complex（复数）

3.输入与输出

print不换行: print(x, end=" ")

input("请输入x值：")

python保留字

4.数据结构 List

lst = [ 'abcd', 786 , 2.23, 'john', 70.2 ]
print(lst)
print(lst[2])
# 向后添加元素
lst.append('hello')
print(lst)
# 删除最后一个元素
lst.pop()
print(lst)
# 删除内容为'abcd'的元素
lst.remove('abcd')
print(lst)
list_ = [2,3,5,1,4,6]
# 按数字大小进行排序
list_.sort()
print(list_)

#列表的便利
list_ = [1,2,3,4]
for i in list_:
    print(i)

i = 0
while i < len(list_):
    print(list_[i])
    i+=1

元组 tuple，与列表基本相同，区别是使用()而不是[],而且元组是只读的

1
2
3

tp = ( 'abcd', 786 , 2.23, 'john', 70.2  )
print(tp)
print(tp[2])

切片字符串和列表都存在切片操作

a = "abcdef"
b = ['a','b','c','d','e','f']
#len方法可以计算字符串、列表的长度
len_ = len(a)
print(len_)
# 完整格式
print(a[0:6:1])
print(b[0:6:1])
# 简写格式
print(a[1:4])
print(b[1:4])
# 设置步进,不写默认最前到最后
print(a[::2])
print(b[::2])
# 逆步进(倒序)
print(a[::-1])
print(b[::-1])
# 前五个2步进
print(a[0:5:2])
print(b[0:5:2])
# 前3个后3个
print(a[:-3])
print(a[-3:])

字典字典是另一种可变容器模型，且可存储任意类型对象。格式如下所示：d = {key1 : value1, key2 : value2 } 键必须是唯一的，但值则不必。值可以取任何数据类型，但键必须是不可变数据类型，如字符串，数字或元组。

dict = {'Name': 'Runoob', 'Age': 7, 'Class': 'First'}

# 获取元素
name1 = dict['Name']
# get第二个参数默认是None
name2 = dict.get('Name')
print(name1)
print(name2)
# 添加元素
dict['Address'] = 'beijing'
print(dict)
# 修改元素，直接用键值覆盖
dict['Name'] = 'xiaotian'
print(dict)
# 删除字典元素
del dict['Address']
print(dict)

# 字典的遍历
dict_ = {'Name': 'Runoob', 'Age': 7, 'Class': 'First'}
# 遍历keys(键)
for i in dict_.keys():
    print(i)
print('---------------------------')
# 遍历values(值)
for i in dict_.values():
    print(i)
print('---------------------------')
# 遍历items(键、值)
for i in dict_.items():
    print(i)
for key,value in dict_.items():
    print (key,end='-')
    print (value)

函数

在Python之中定义函数要遵循以下几点：

python中定义函数使用 def 关键字，后面接着函数名称和括号(),最后跟着冒号

函数的参数应当写在括号之中，也可以在括号中定义参数

函数中的 docstring 是可选的，可写可不写，建议写上

函数中的代码块要进行缩进

return 可以用于返回函数返回值或者表达式，return后面什么都不写与 return None 或者不写return 等价

1.参数

调用参数的时候可以使用以下几种参数：

必要(required)参数

关键(keyword)字参数

默认(default)参数

变长(variable-length)参数

必要参数是指必须按照函数定义的参数列表的定义，按照顺序传入，如change_list函数就必须传入mylist参数,如果没有按照要求进行传递参数，将会报错：

1 2	def change_list(mylist): mylist.append('test')

当使用关键字参数对函数进行调用的时候，要按照函数定义的参数名进行赋值，这样允许不按照函数定义的参数顺序进行赋值，因为Python解释器这是可以将参数与传入变量进行对应。

def test(a, b):
    print(a)
    print(b)
test(b=2, a=1)

默认参数指的是当参数的值没有在调用时指定的话，函数将为参数赋值一个预先定义好的值，默认参数的位置应当位于参数列表的最后：

def test(a, b=5):
    print(a)
    print(b)
test(1)
test(1, 2)

当函数需要处理的变量的数目不确定的时候可以使用变长参数，变长参数在函数定义的时候名字没有预先确定，在定义函数的时候在变量前面增加一个星号（*）表示这是一个变长变量：

# *args接收单个参数
# **kwargs接收字典型键值对
def printinfo(*args,**kwargs):
    print(args)
    print(kwargs)
printinfo(1,'asdas',name='asd',age=12)

2.全局变量与局部变量

在函数体中定义的变量具有局部作用域，他们只能在定义他们的函数之内被访问；全局变量可以在程序内被所有的函数访问。

# 如果在局部使用全局变量，不加global视为在局部新建一个变量，与原变量地址空间不同。
a = 50
def test():
    a = 100
    print(a)
test()
print(a)

# 添加global，标记为全局变量
a = 50
def test():
    global a
    a = 100
    print(a)
test()
print(a)

a = 50
def test():
    num = a
    print(num)
test()
print(a)

3.参数传递

python中参数的传递都是引用的传递，传递的是地址空间，也可以说传递的是变量所占的地址空间

def change_list(mylist,str_):
    mylist.append('test')
    print('Inside function {}'.format(mylist))
    # a指向传过来的str_的地址空间
    str_ = str_ + 'def'
    print(str_)
lst = [1, 2, 3]
b = 'abc' 
change_list(lst,b)
# 经过函数的处理，lst最后增加了一个元素
# 经过函数的处理，a,b都指向'abc'的地址空间
print(lst)
print(b)

4.匿名函数

匿名函数的定义方法与前面讲的传统的使用def的方法是不同，使用lambda关键词定义匿名函数。

匿名函数可以接收任意数目的参数，但是返回值只能是一个值或者表达式，他们不能包含命令或者多个表达式。

lambda 函数具有自己的命名空间，他们除了参数列表中的变量以及全局变量之外无法访问其他的变量。
lambda 函数无法直接被用来打印变量，lambda 函数需要一个表达式
1
2
3
sm = lambda x, y: x + y
res = sm(1, 2)
print(res)

文件操作

1.open函数

open函数被用来打开或者创建文件，它将会返回一个file对象。

1	file object = open(file_name [, access_mode][, buffering][,encoding])

有几种常见的模式：

r 表示以只读方式打开（默认模式）

rb 表示以二进制只读打开

r+ 以读写方式打开

rb+ 以二进制读写方式打开

w 以只写方式打开（如果已有文件，将会覆盖，如果没有将会创建）

wb 以二进制只写打开（如果已有文件，将会覆盖，如果没有将会创建）

w+ 以读和写的方式打开一个文件（如果已有文件，将会覆盖，如果没有将会创建）

wb+ 以二进制方式读和写的方式打开一个文件（如果已有文件，将会覆盖，如果没有将会创建）

a 表示 append，追加模式，如果文件存在的话，指针将位于文件的末尾，如果没有文件的话将会创建一个

ab 以二进制格式追加模式打开一个文件

a+ 以追加和读的方式打开一个文件

ab+ 以追加和读的方式打开二进制格式文件

buffer是读写的缓冲区间，有可以选三种值: 0 表示不适用buffer，1 表示 buffer 文件中的一行，其余正整数表示buffer的字节数

encoding是指明对文件编码，仅适用于文本文件。如果不明编码方式，默认是使用locale.getpreferredencoding()函数返回的编码方式。

2.file对象属性

file.close 返回文件是否关闭

file.mode 返回打开文件的模式
file.name 返回文件的名字
1
2
3
4
5
fo = open("foo.txt", "wb")
print ("Name of the file: ", fo.name)
print ("Closed or not : ", fo.closed)
print ("Opening mode : ", fo.mode)
fo.close()

close()方法 file对象的close()方法用于将没有写入文件的信息写入文件以后关闭文件，在打开文件以后要记得关闭文件。

3.read()和write()

file 对象可以使用 read() 对文件的内容进行读取，在使用read()对文件进行读取的时候会一次性的读取文件的全部内容。还可以使用readline()，每次读取文件的一行，readlines()则可以一次读取文件中的全部行，并且返回一个列表。

write() 文件可以用于将内容写入文件，在写完文件之后一定要记得使用close()方法保证对于文件的更改已经写入了硬盘。

在对文件进行读写操作的时候，常用的一种方式是使用with，这样Python在对文件读写完成后会自动调用close()方法:

fo = open("foo.txt", "w+")
fo.write("hello world")
fo.close()
fo = open("foo.txt", "r")
t = fo.read()
print(t)

with open("foo.txt", "w+") as f:
    for i in range(10):
        f.write(str(i) + '\n')
with open("foo.txt", "r") as f:
    print(f.readlines())

Python 进阶

1.生成器

通过列表生成式，我们可以直接创建一个列表。但是，受到内存限制，列表容量肯定是有限的。而且，创建一个包含100万个元素的列表，不仅占用很大的存储空间，如果我们仅仅需要访问前面几个元素，那后面绝大多数元素占用的空间都白白浪费了。所以，如果列表元素可以按照某种算法推算出来，那我们是否可以在循环的过程中不断推算出后续的元素呢？这样就不必创建完整的list，从而节省大量的空间。

在Python中，这种一边循环一边计算的机制，称为生成器generator。

# 这是一个列表生成式
L = [ x*2 for x in range(5)]
# 最简单的生成器写法
G = ( x*2 for x in range(5))
print(type(L))
print(type(G))
print(next(G))
print(next(G))
print(next(G))
# send输入高级用法
print(G.send(None))

经常我们的生成的生成器不会使用next()，而是直接使用for循环遍历，而生成器的写法也就是将函数的return变成yield，我们后面的AI学习中可能遇到很大的数据量，因此生成器是我们必不可少的工具

# 通过生成器实现斐波那契数列
def fib(times):
    n = 0
    a,b = 0,1
    while n<times:
        yield b
        a,b = b,a+b
        n+=1

2. 迭代器

迭代是访问集合元素的一种方式。迭代器是一个可以记住遍历的位置的对象。迭代器对象从集合的第一个元素开始访问，直到所有的元素被访问完结束。迭代器只能往前不会后退。 1、可迭代对象可以直接作用于 for 循环的数据类型有以下几种：

一类是集合数据类型，如 list 、 tuple 、 dict 、 set 、 str 等；

一类是 generator ，包括生成器和带 yield 的generator function。

这些可以直接作用于 for 循环的对象统称为可迭代对象： Iterable 迭代器都是可迭代对象 2、迭代器可以被next()函数调用并不断返回下一个值的对象称为迭代器：Iterator。 生成器都是迭代器 3、iter()函数生成器都是 Iterator 对象，但 list 、 dict 、 str 虽然是 Iterable ，却不是 Iterator 。把 list 、 dict 、 str 等 Iterable 变成 Iterator 可以使用 iter() 函数：

# list 是可迭代对象
a = [1,2,3,4]
for i in a:
    print(i)

3.装饰器

装饰器是由python闭包系统实现的，本质上，就是一个返回函数的高阶函数，请看：

from time import ctime
def timefun(func):
    def wrappedfunc(*args,**kwargs):
        print("%s called at %s"%(func.__name__, ctime()))
        return func(*args,**kwargs)
    return wrappedfunc

@timefun
def test(a,b):
    print(a+b)

test(1,2)

# 通过装饰器验证登陆 只有输入1才可以运行
def login(func):
    def wfunc(*args,**kwargs):
        pop = input('输入编号')
        if pop == '1':
            print('欢迎登陆')
            return func(*args,**kwargs)
        else:
            print('陌生访问')
    return wfunc

@login
def test(a,b):
    print(a+b)

test(1,2)

错误处理

1.异常处理

在python中对异常进行处理时经常使用以下结构：

try:
    pass
except error0:
    pass
except error1:
    pass
finally:
    pass

try后面跟着需要执行的代码块，except后面标注异常的类型，在出现对应的异常的时候执行对应代码块，无论程序是否正确执行finally后面接着的代码块都会在最后执行。Python 中内置的异常类型，参考这个连接。

2.Assert语句

assert使用方法：

1	assert Expression[, Expression]

assert 将会判断后面跟的表达式是否为真，如果为false的话将会抛出AssertionError异常，经常可以用于对变量类型，或者程序逻辑进行检查，比如我们写一个计算两个向量内积的程序：

def mul(x, y):
    assert len(x) == len(y), "size of 2 vectors should be same"
    t = 0
    for i in range(len(x)):
        t += x[i] * y[i]
    return t

Python 爬虫基础、技巧

1.JSON

JSON (JavaScript Object Notation) 是一种轻量级的数据交换格式,是前后端交互中常用的数据格式，它与python内建的字典很相似，可以相互转换，然后供我们提取数据 python3 中可以使用 json 模块来对 JSON 数据进行编解码，它包含了两个函数：

1 2	json.dumps(): 对数据进行编码,也就是字典变为json. json.loads(): 对数据进行解码,也就是json变为字典.

json.dumps()中可以将ensure_ascii 设为 False可以避免一些不必要的问题

import json
# 一个标准的python字典
data = {
    'no' : 1,
    'name' : 'Runoob',
    'url' : 'http://www.runoob.com'
}
# 将字典转换成json
json_str = json.dumps(data,ensure_ascii=False)

print(json_str)
print(type(json_str))

# 一串json数据
json_str = '{"name": "Runoob", "no": 1, "url": "http://www.runoob.com"}'
# 将json转为字典
dict_ = json.loads(json_str)
print(dict_)
print(dict_['name'])
print(type(dict_))

2.Requests库

Requests is an elegant and simple HTTP library for Python, built for human beings. 简单的请求模式:

# get请求
response = requests.get(url, params={}, headers={}, cookie={}, proxies={}, verify=False)
# post请求
response = requests.post(url, data={}, headers={}, cookie={}, proxies={}, verify=False)

response的方法:

response.status_code #响应状态码
response.content #字节方式的响应体，会自动为你解码 gzip 和 deflate 压缩
response.text #字符串方式的响应体，会自动根据响应头部的字符编码进行解码
response.json() #Requests中内置的JSON解码器
response.headers #响应的响应行
response.request.headers #请求的请求行

我们可以通过Cookie或者Session进行模拟登陆，获取登录后的数据。

# Cookie进行模拟登陆
import re
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36','Cookie':'anonymid=jd5aevsb-2nuy7n; _r01_=1; __utmz=151146938.1520997522.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic; __utma=151146938.942267969.1520997521.1520997521.1520997521.1; depovince=BJ; _de=84C13AEF64FC380CAE794CB9CCA086D2; ln_uact=18610864225; ln_hurl=http://head.xiaonei.com/photos/0/0/men_main.gif; jebe_key=59aeee30-3816-4484-9881-4f6690bdce3f%7C100ad246dd59063859de7b1ea3f6ad38%7C1526957380310%7C1%7C1526957382140; jebecookies=74f2005f-ba3d-4a61-a1d0-19948a4d8970|||||; JSESSIONID=abcD_hI_-8WDRf9JK4fow; ick_login=cc4c902c-a43d-4481-8cb3-04ecdc2b9c2f; p=9a54e6619286a17266723fc6b08cc5f04; first_login_flag=1; t=97663cf647a6e85e1bd5d13636b0a2724; societyguester=97663cf647a6e85e1bd5d13636b0a2724; id=963865664; xnsid=7a1d85fa; loginfrom=syshome; wp_fold=0'}
url = 'http://www.renren.com/963865664/profile'
resp = requests.get(url,  headers=headers)
html = resp.content.decode()
print(re.findall(r"李**",html))

# Session进行模拟登陆
import requests
import re
url = "http://www.renren.com/PLogin.do"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
data = {
    'email':'186****4225',
    'password':'q****6'
}
session_my = requests.session()
session_my.post(url, data=data, headers=headers)
html = session_my.get('http://www.renren.com/963865664/profile', headers=headers).content.decode()
print(re.findall(r"李**",html))

3.Xpath

XPath 是一门在 XML 文档中查找信息的语言。XPath 用于在 XML 文档中通过元素和属性进行导航。 lxml库：

导入lxml的etree库

利用etree.HTML，将字符串转化为Element对象

Element对象具有xpath的方法

lxml 可以自动修正 html 代码

看一个实例

ht = """
<?xml version="1.0" encoding="ISO-8859-1"?>

<bookstore>·

    <book category="COOKING">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
    </book>

    <book category="CHILDREN">
      <title lang="en">Harry Potter</title>
      <author>J K. Rowling</author>
      <year>2005</year>
      <price>29.99</price>
    </book>

    <book category="WEB">
      <title lang="en">XQuery Kick Start</title>
      <author>James McGovern</author>
      <author>Per Bothner</author>
      <author>Kurt Cagle</author>
      <author>James Linn</author>
      <author>Vaidyanathan Nagarajan</author>
      <year>2003</year>
      <price>49.99</price>
    </book>

    <book category="WEB">
      <title lang="en111">Learning XML</title>
      <author>Erik T. Ray</author>
      <year>2003</year>
      <price>39.95</price>
    </book>

</bookstore>
"""

from lxml import etree
xml = etree.HTML(ht) 
# 由于etree将xml修正，根节点已经不是bookstore，因此我们采用//
# 注意xpath返回的内容都是列表
# 选取所有 title的文本
titles = xml.xpath('//bookstore/book/title/text()')
print(titles)
# 选取第一个 title的文本
one_title = xml.xpath('//bookstore/book[1]/title/text()')
print(one_title)

# 选取价格大于35的book的year信息
year = xml.xpath('//bookstore/book[price>35]/year/text()')
print(year)

# 选取book category="CHILDREN"的书的price
price = xml.xpath('//bookstore/book[@category="CHILDREN"]/price/text()')
print(price)

# 选取book category=“WEB”的书的title的lang属性
lang = xml.xpath('//bookstore/book[@category="WEB"]/title/@lang')
print(lang)

# xpath如果不去选择属性和txt，会返回Element对象，可以对Element对象再次xpath
aaa = xml.xpath('//bookstore/book[@category="WEB"]/title')
print(aaa)
for i in aaa:
    bbb = i.xpath('./@lang')
    print(bbb)

参考资料

Requests

Xpath菜鸟教程

Xpath W3school

4.反反爬

反反爬虫措施，就是反反爬，当然还有反反反爬虫。反反爬虫措施有以下几种:

降低访问频率(在requests前加time.sleep())

模拟请求头

模拟登陆

使用代理IP

携带cookie

设置Referer header(躲过referer检测)

逆向js(解密别人的反爬手段，这个比较高级)

leetcode_链表

发表于 2018-07-22 | 更新于 2018-09-22 | 分类于 leetcode

链表

1、链表是否有环（141. Linked List Cycle）

// 快慢节点，快的一次走两步，慢的一次走一步，快的始终比慢的快，当两者相等的时候，说明有环
public boolean hasCycle(ListNode head) {
    if(head == null) return false;
    ListNode fast = head;
    ListNode slow = head;
    while(fast.next != null){
        fast = fast.next;    //走两步
        if(fast.next == null){
            return false;
        }
        fast = fast.next;
        slow = slow.next;    // 走一步
        
        if(fast == slow){
            return true;
        }
    }
    return false;
}

2、链表环的入口（142. Linked List Cycle II）

// 只要证明相等的时候，从head走到入口和fast继续走到入口的距离相等
// 第一次相遇的时候： S_fast = |OA| + |AB| + |BA| + |AB| = x+y+z+y
// S_slow = |OA| + |AB| = x+y, S_fast = 2*S_slow,所以z=x
// 即： slow从头开始走和fast从相遇位置走，距离一样，一样的节点就是入口
public ListNode detectCycle(ListNode head) {
    if(head == null) return null;
    ListNode fast = head;
    ListNode slow = head;
    boolean flag = false;
    while(fast.next !=null){
        fast = fast.next;
        if(fast.next == null){
            return null;
        }
        fast = fast.next;
        slow = slow.next;
        if(fast == slow){
            slow = head;
            flag = true;
            break;
        }
    }
    if(flag){
        while(slow != fast){
            slow = slow.next;
            fast = fast.next;
        }
        return slow;
    }
    return null;
}

3、数组中找到重复的数字（287. Find the Duplicate Number）

题目：1--n的数放入n+1长的数组，肯定有一个重复的数字

Input: [1,3,4,2,2]
Output: 2
Input: [3,1,3,4,2]
Output: 3

// 方案一：二分法
/*
* 在区间[1, n]中搜索，首先求出中点mid，然后遍历整个数组，统计所有小于等于mid的数的个数，如果个数大于mid，则说明重复值在[mid+1, n]之间，反之，重复值应在[1, mid-1]之间，然后依次类推，直到搜索完成，此时的low就是我们要求的重复值
*/
public int findDuplicate(int[] nums) {
    int l = 0;
    int r = nums.length -1;
    while(l < r){
        int count = 0;
        int mid = l + (r-l)/2;
        for(Integer a:nums){
            if(a <= mid) count++;
        }
        if(count <= mid) l = mid + 1;
        else r = mid;
    }
    return l;
}

//方案二，类似链表有环的情况，如果有重复的元素，就一定有环。
/*
 * 从数组n+1个元素映射到从1-n的数，换检测定义，给定一个函数，和序列，定义如下：
 * x_0  = k
 * x_1 = f(x_0)
 * x_2 = f(x_1)
 * ...
 * x_(n+1) = f(x_n)
 * 假设函数f从定义域映射到它本身，此时会有3种情况。首先，如果定义域是无穷的，则序列是无限长并且没有循环的。例如，函数 f(n) = n + 1，在整数范围内满足这个性质 - 没有数字是重复的。 第二， 序列可能是一个闭合循环，这意味着存在一个i使得x_0 = x_i。在这个例子中，序列在一组值内无限循环。第三，序列有可能是的“ρ型的”，此时序列看起来像下面这样：
   x_0 -> x_1 -> ... x_k -> x_{k+1} ... -> x_{k+j}
                         ^                       |
                         |                       |
                         +-----------------------+
* 也就是说，序列从一列链条型的元素开始进入一个环，然后无限循环。我们将环的起点称为环的“入口”。
* 从x_0开始： slow = f(slow)
              fast = f(f(fast))
  当 slow==fast时。在slow=x_0，然后fast和slow每次只走一步，当再次相等时，入口元素即是重复元素
*/
public int findDuplicate(int[] nums) {
    if(nums.length == 0) return 0;
    int fast = 0;
    int slow = 0;
    int t = 0;
    while(true){
        slow = nums[slow];
        fast = nums[nums[fast]];
        if(slow == fast){
            break;
        }
    }
    while(true){
        slow = nums[slow];
        t = nums[t];
        if(slow == t){
            break;
        }
    }
    return slow;        
}

4、删除链表中重复元素，重复的元素全部删除（82. Remove Duplicates from Sorted List II）

/*
* 递归解法
*/
public ListNode deleteDuplicates(ListNode head) {
    if(head == null || head.next == null) return head;
    ListNode next = head.next;
    // 处理头结点和next节点值一样
    if(head.val == next.val){
        while(next != null && head.val == next.val){
            next = next.next;
        }
        return deleteDuplicates(next);
    }else{
        // 处理头结点和next节点值不一样
        head.next = deleteDuplicates(head.next);
        return head;
    }
} 

/**
* 非递归解法，增加一个pre节点，增加pre初始节点first，则first.next = head
*/
public ListNode deleteDuplicates(ListNode head) {
    if(head == null || head.next == null) return head;
    // 增加一个节点表示 head的pre节点
    ListNode first = new ListNode(-1);
    first.next = head;
    // 分别赋值两个节点，一个当前节点，一个pre节点
    ListNode cur = head;
    ListNode pre = first;
    //循环条件，当前节点不为null，下一个节点不会null
    while(cur !=null && cur.next != null){
        // 定义next节点，用来和cur节点比较
        ListNode next = cur.next;
        if(cur.val == next.val){
            // 继续往后试探，表示 cur 到 next的值都 相同
            while(next != null && cur.val == next.val){
                next = next.next;
            }
            // 这个点比较难思考，因为所有相同的节点都删除，所以pre指向 next，
            pre.next = next;
            cur = next;
        }else{
            // pre指向当前节点，cur向后移动
            pre = cur;
            cur = cur.next;
        }
    }
    return first.next;  // 返回初始节点的后一个
}

5、反转链表（206. Reverse Linked List）

Input: 1->2->3->4->5->NULL
Output: 5->4->3->2->1->NULL
// 解法：  1、 null 1->2->3->null
//         2、 null<-1 2->3->null
//         3、 null<-1<-2 3->null
// 迭代法
public ListNode reverseList(ListNode head) {
    ListNode prev = null;
    while(head !=null){
        // 保存断开后的链接
        ListNode next = head.next;
        // 原地反转
        head.next = prev;
        prev = head;
        head = next;
    }
    return prev;
}
// 递归法
public ListNode reverseList(ListNode head) {
    if(head == null || head.next ==null) return head;
    ListNode node = reverseList(head.next);
    // 考虑第一个head，反转指针，指向null
    head.next.next = head;
    head.next = null;
    return node;
}

6、删除元素（27. Remove Element）

// Given nums = [3,2,2,3], val = 3,  return length=2. [2,2,x,x]
// Given nums = [0,1,2,2,3,0,4,2], val = 2,return length=5. [0,1,3,0,4]
// Method 1: 双指针，bengin 和 end
public int removeElement(int[] nums, int val) {
    int begin = 0;
    int end = nums.length - 1;
    while(begin <= end){
        if(nums[begin] == val){
            // 将end的值赋给begin，并且end--, 下次循环继续检查begin
            nums[begin] = nums[end--];
        }else{
            // 继续移动
            begin++;
        }
    }
    return begin;
}

// Method 2: 双指针，fast 和 slow
// fast每移动一个，如果 nums[fast]!=val,  nums[slow++] = nums[fast]
// 将不删除的元素放到他的位置上，留下的是什么元素不重要
public int removeElement(int[] nums, int val) {
    int slow = 0;
    for(int i=0;i<nums.length;i++){
        if(nums[i] != val){
            nums[slow++] = nums[i];
        }
    }
    return slow;
}

扩展，剑指offer（在 O(1) 时间内删除链表节点）

// 如果该节点不是尾节点，那么可以直接将下一个节点的值赋给该节点，然后令该节点指向下下个节点，再删除下一个节点，时间复杂度为 O(1)。
// 否则，就需要先遍历链表，找到节点的前一个节点，然后让前一个节点指向 null，时间复杂度为 O(N)。
public ListNode deleteNode(ListNode head, ListNode tobeDelete) {
    if(head == null || head.next == null || tobeDelete == null) return null;
    if(tobeDelete.next != null){
        ListNode next = tobeDelete.next;
        tobeDelete.val = next.val;
        tobeDelete.next = next.next;
    }else{
        ListNode cur = head;
        while(cur.next != null) cur = cur.next;
        cur.next = null;
    }
    return head;
}

7、删除链表中倒数第K个节点（19. Remove Nth Node From End of List）

// 思路比较简单，fast指针先走n步，然后快慢指针一起走
// 注意点，定义一个头节点指向head节点，从头结点开始循环
public ListNode removeNthFromEnd(ListNode head, int n) {
    ListNode start = new ListNode(-1);
    // 定义头节点，从头结点开始
    ListNode fast = start;
    ListNode slow = start;
    start.next = head;
    // 先走n步
    int count = 0;
    while(count < n && fast.next !=null){
        fast = fast.next;
        count++;
    }
    // 走到结尾
    while(fast.next !=null){
        fast = fast.next;
        slow = slow.next;
    }
    // 删除 slow后面的节点
    slow.next = slow.next.next;
    return start.next;
}

8、合并两个排好序的链表（21. Merge Two Sorted Lists）

// 思路比较简单，1、定义好开始节点
// 2、定义两个指针节点， l1 和 l2, 不断向后移动
// 3、走到结尾时，l1 或 l2 有剩余的节点，遍历完所有的节点
public ListNode mergeTwoLists(ListNode l1, ListNode l2) {
    if(l1 == null) return l2;
    if(l2 == null) return l1;
    // 步骤一
    ListNode head = null;
    if(l1.val<l2.val){
        head = l1;
        l1 = l1.next;
    }else{
        head = l2;
        l2 = l2.next;
    }
    // 步骤二
    ListNode first = head;
    while(l1 !=null && l2 !=null){
        if(l1.val < l2.val){
            head.next = l1;
            l1 = l1.next;
        }else{
            head.next = l2;
            l2 = l2.next;
        }
         head = head.next;
    }
    // 步骤三
    while(l1!=null){
        head.next = l1;
        head = head.next;
        l1 = l1.next;
    }
    while(l2!=null){
        head.next = l2;
        head = head.next;
        l2 = l2.next;
    }
    return first;
}

9、链表排序（148. Sort List）

// Method 1：归并排序，递归法，时间 O（NlgN）, 空间 O（lgN）
// 快慢指针找中点，递归两个子链表
class Solution {
    public ListNode sortList(ListNode head) {
        if(head == null || head.next == null) return head;
        ListNode slow = head;
        ListNode fast = head.next;    // 注意点，开始fast的位置
        while(fast != null && fast.next != null){
            fast = fast.next.next;
            slow = slow.next;
        }
        ListNode mid = slow.next;    // 注意点，终止，mid的计算方式
        slow.next = null;
        return merge(sortList(head), sortList(mid));  // 递归的方式
    }
    public ListNode merge(ListNode l1, ListNode l2){
        ListNode dummy = new ListNode(-1);    // 技巧，设定dummy节点
        ListNode tail = dummy;
        while(l1 !=null && l2 !=null){
            if(l1.val > l2.val ){
                tail.next = l2;
                l2 = l2.next;
            }else{
                tail.next = l1;
                l1 = l1.next;
            }
            tail = tail.next;
        }
        if(l1 != null) tail.next = l1;
        if(l2 != null) tail.next = l2;
        return dummy.next;
    }
}
// method 2: 归并排序，自底向上

10、归并排序，自顶向下，自底向上

/*
 * 归并排序
 * 递归格式要记住 
 */
public static void mergeSort(int [] arr, int low, int high){
	if(low<high){
		int mid = (low + high)/2;
		mergeSort(arr, low, mid);
		mergeSort(arr, mid+1, high);
		merge(arr, low, mid, high);
	}
}

/*
 * 归并排序, 自底向上，迭代法
 * 两轮循环，第一轮循环 ： 一次看多少数据  1, 2, 4, 8, ....
 * 第二轮迭代: 对连续两个子区间    进行归并排序  
 */
public static void mergeSortBU(int arr[], int n){
	for(int size=1; size <= n; size += size){
		for(int i=0; i+size < n; i += size + size){
			merge(arr, i, i+size-1, Math.min(i+size+size-1, n-1));
		}
	}
}

/*
 * 合并两个有序数组
 * 需要空间存储原始 数组
 */
public static void merge(int [] arr, int low, int mid, int high){
	int [] brr = Arrays.copyOf(arr, arr.length);
	int i=low,j=mid+1,k=i;
	for(;i<=mid && j<=high;){
		if(brr[i] < brr[j]){
			arr[k++] = brr[i++];
		}else{
			arr[k++] = brr[j++];
		}
	}
	while(i<=mid){
		arr[k++] = brr[i++]; 
	}
	while(j<=high){
		arr[k++] = brr[j++];
	}
}

11、全排列（46. Permutations）

Input: [1,2,3]
Output:
[
  [1,2,3],
  [1,3,2],
  [2,1,3],
  [2,3,1],
  [3,1,2],
  [3,2,1]
]
// 解法：
/*
 * 全排列   递归实现
 * 1 + 【后面的全排列】
 * 递归的实现这个过程
 * 递归出口就是 这个数组只剩下最后一个元素
 * 1，2，3，4，分别开头 ，然后交换，然后在交换回来
 * 1 【2,3,4】 交换 2【1,3,4】 3【2,1,4】 4【2,3,1】 递归实现
 */
 public List<List<Integer>> permute(int[] nums) {
    List<List<Integer>> list = new ArrayList<List<Integer>>();
    perm(nums, 0, nums.length-1, list);
    return list;
}
public static void perm(int[] nums, int l, int r, List<List<Integer>> list){
    if(l == r){
        List<Integer> li = genList(nums);   // 每次出口就是一个排列
        list.add(li);
    }else{
        for(int i=l;i<=r;i++){
            swap(nums, l, i);    // 交换
            perm(nums, l+1, r, list); 
            swap(nums, l, i);   // 交换回来
        }  
    } 
}
public static void swap(int[] nums, int i, int j){
    int temp = nums[i];
    nums[i] = nums[j];
    nums[j] = temp;
}
public static List<Integer> genList(int[] nums){
    List<Integer> li = new ArrayList<Integer>();
    for(int i=0;i<nums.length;i++){
        li.add(nums[i]);
    }
    return li;
}

李鹏伟

12 日志

1 分类

11 标签